Automated Extraction of TAGs from the Penn Treebank
Abstract
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is in large part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation into smoothing these grammars by means of an external linguistic resource, namely the tree families of an XTAG grammar, a hand-built grammar of English.
Similar resources
Nondeterministic LTAG Derivation Tree Extraction
In this paper we introduce a naive algorithm for nondeterministic LTAG derivation tree extraction from the Penn Treebank and the Proposition Bank. This algorithm is used in the EM models of LTAG Treebank Induction reported in (Shen and Joshi, 2004). Given the trees in the Penn Treebank with PropBank tags, this algorithm generates shared structures that allow efficient dynamic programming in th...
Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic
We investigate how morphological features in the form of part-of-speech tags impact parsing performance, using Arabic as our test case. The large, fine-grained tagset of the Penn Arabic Treebank (498 tags) is difficult for parsers to handle, ultimately due to data sparsity. However, ad-hoc conflations of treebank tags run the risk of discarding potentially useful parsing information. The main c...
Identifying Verb Arguments and their Syntactic Function in the Penn Treebank
In this paper, we present a tool that allows one to automatically extract verb argument structure from the Penn Treebank as well as from other corpora annotated with the Penn Treebank release 2 conventions. More specifically, we examine each possible sequence of tags, both functional and categorial, and determine whether such a sequence indicates an obligatory argument, an optional argument or a...
Parsing Arabic Using Treebank-based LFG Resources
In this paper we present initial results on parsing Arabic using treebank-based parsers and automatic LFG f-structure annotation methodologies. The Arabic Annotation Algorithm (A) (Tounsi et al., 2009) exploits the rich functional annotations in the Penn Arabic Treebank (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) to assign LFG f-structure equations to trees. For parsing, we modify ...
Sense Tagging the Penn Treebank
This paper describes the methodology that is being used to augment the Penn Treebank annotation with sense tags and other types of semantic information. Inspired by the results of SENSEVAL, and the high inter-annotator agreement that was achieved there, similar methods were used for a pilot study of 5000 words of running text from the Penn Treebank. Using the same techniques of allowing the ann...